Fix critical bugs from comprehensive review + PRS-CS Python validation by gaow · Pull Request #464 · StatFunGen/pecotmr

gaow · 2026-04-07T15:43:12Z

Summary

Fix 3 critical bugs found in comprehensive code review, plus validation of PRS-CS against the original Python implementation (getian107/PRScs).

Bug fixes

1. lassosum_rss() lambda reorder bug (Critical)

result$beta[, order] <- result$beta was a self-assignment that silently produced wrong results. The lambda path is solved in descending order for warm-starting, then "reordered" back — but the reorder was a no-op. Fixed to use inverse permutation via order(order).

2. SDPR thread-safety (Documentation)

sample_assignment() uses a shared std::mt19937 across threads when n_threads > 1, causing a data race. This matches the original SDPR (eldronzhou/SDPR) which has the same issue with gsl_rng. The default n_threads=1 avoids the race. Documented with a clear warning comment.

3. check_ld eigenfix PSD vs PD (Medium)

eigenfix was setting negative eigenvalues to 0, producing a PSD (not PD) matrix. PRS-CS and SDPR require PD for Cholesky decomposition. Fixed to use r_tol (1e-8) as the floor, ensuring strict positive-definiteness.

PRS-CS Python validation

Compared pecotmr C++ PRS-CS against the original Python PRScs (getian107/PRScs) on identical simulated data (n=500, p=50, 10 causal SNPs):

Metric	Value
cor(Python beta, C++ beta)	0.995
Python cor(est, truth)	0.985
C++ cor(est, truth)	0.972
Sigma ratio (C++/Python)	1.014
Top-10 SNP overlap	8/10

Despite different RNG implementations (numpy vs std::mt19937), both converge to essentially the same posterior.

Other review findings (not fixed, lower priority)

gigrnd() clamps result to [0,1] — matches original PRS-CS which does psi[psi>1]=1 post-GIG
phi memory leak risk on exception in PRS-CS — low risk, would need unique_ptr
LD blocks re-read per lambda in lassosum C++ — performance only
exists() search path in otters_weights — works when package is loaded
Float precision in SDPR sample_assignment — matches original SSE single-precision

Test plan

PRS-CS validated against original Python (cor=0.995)
devtools::test() passes
R CMD check passes

🤖 Generated with Claude Code

1. lassosum_rss() lambda reorder bug (R/regularized_regression.R): Self-assignment result$beta[, order] <- result$beta was a no-op. Fixed to use inverse permutation: result$beta[, order(order)]. Same fix for conv, loss, fbeta vectors. 2. SDPR thread-safety documentation (src/sdpr_mcmc.cpp): Document the data race on shared std::mt19937 when n_threads > 1. This matches the original SDPR which has the same issue. Default n_threads=1 avoids the race. 3. check_ld eigenfix now produces PD not just PSD (R/LD.R): Set negative eigenvalues to r_tol (default 1e-8) instead of 0, ensuring Cholesky decomposition succeeds in PRS-CS/SDPR. Setting to exactly 0 gave PSD which is not sufficient for Cholesky. Also verified via PRS-CS comparison: - cor(Python PRScs, pecotmr C++) = 0.995 on beta estimates - Both recover identical signal (cor with truth ~0.97-0.98) - Sigma estimates match within 1.4% Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PRS-CS: Replace heap allocation (new/delete) of phi with stack variables in both prs_cs_mcmc() and the Rcpp wrapper. The old code leaked memory if an exception was thrown between new and delete (e.g., Cholesky failure, user interrupt). Now uses phi_local on the stack, eliminating the leak risk entirely. lassosum: Document that min(fbeta) model selection is adequate vs pseudovalidation. Empirical comparison over 20 random trials shows no systematic advantage for either approach (4 vs 6 wins, 10 ties). The s-grid search provides the primary regularization; lambda selection within each s has minimal impact. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. pval_acat: handle very small p-values (< 1e-15) using asymptotic approximation tan(pi*(0.5-p)) ~ 1/(pi*p) to avoid Inf/NaN overflow. 2. otters_weights: remove global b-clamping that biased all methods. Move cor>=1 safeguard into lassosum_rss_weights() where it belongs, since only lassosum requires correlations strictly in (-1,1). PRS-CS and SDPR handle their own regularization. 3. check_ld eigenfix: trigger on !is_pd (not just !is_psd), so matrices with eigenvalues exactly 0 are also repaired to be strictly positive definite for Cholesky decomposition. 4. lassosum C++: cache LD blocks in std::vector<arma::mat> before the lambda loop instead of re-copying from Rcpp::List on every iteration. Uses const reference in inner loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gaow and others added 4 commits April 7, 2026 11:42

Update documentation

ee474ca

gaow merged commit 65a683f into main Apr 7, 2026
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix critical bugs from comprehensive review + PRS-CS Python validation#464

Fix critical bugs from comprehensive review + PRS-CS Python validation#464
gaow merged 4 commits intomainfrom
fix-review-bugs

gaow commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gaow commented Apr 7, 2026

Summary

Bug fixes

PRS-CS Python validation

Other review findings (not fixed, lower priority)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant